Overview

Dataset statistics

Number of variables: 17
Number of observations: 5780
Missing cells: 111
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.6 MiB
Average record size in memory: 660.9 B
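
The overview figures can be cross-checked against each other. The sketch below is only an illustration: it takes the per-variable missing counts listed later in the report (37, 16, 21, 16, 21) and assumes the missing-cell percentage is taken over all cells (variables times observations). The variable names are hypothetical.

```python
# Figures copied from the report; names are illustrative only.
n_variables = 17
n_observations = 5780

# Missing counts reported for the five variables that have missing values.
missing_per_variable = [37, 16, 21, 16, 21]

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable)
missing_pct = 100 * missing_cells / total_cells

# Average record size times row count approximates the total footprint.
avg_record_bytes = 660.9
total_mib = avg_record_bytes * n_observations / 2**20

print(missing_cells, round(missing_pct, 1), round(total_mib, 1))
```

Under these assumptions the three values agree with the reported 111 missing cells, 0.1%, and 3.6 MiB.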

Variable types

Numeric: 8
DateTime: 2
Categorical: 5
Text: 2

Alerts

Unnamed: 0 is highly overall correlated with color [High correlation]
color is highly overall correlated with Unnamed: 0 and 2 other fields [High correlation]
distance is highly overall correlated with fare and 2 other fields [High correlation]
dropoff_borough is highly overall correlated with color and 1 other field [High correlation]
fare is highly overall correlated with distance and 2 other fields [High correlation]
log_tip is highly overall correlated with payment and 1 other field [High correlation]
log_total is highly overall correlated with distance and 2 other fields [High correlation]
payment is highly overall correlated with log_tip and 1 other field [High correlation]
pickup_borough is highly overall correlated with color and 1 other field [High correlation]
tip is highly overall correlated with log_tip and 1 other field [High correlation]
total is highly overall correlated with distance and 2 other fields [High correlation]
tolls is highly imbalanced (97.9%) [Imbalance]
pickup_borough is highly imbalanced (62.8%) [Imbalance]
dropoff_borough is highly imbalanced (59.6%) [Imbalance]
Unnamed: 0 is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
passengers has 85 (1.5%) zeros [Zeros]
tip has 2070 (35.8%) zeros [Zeros]
log_tip has 2070 (35.8%) zeros [Zeros]
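
The imbalance percentages in the alerts can be reproduced from the category counts given later in the report. The sketch below assumes the score is one minus the normalised Shannon entropy of the class frequencies. The report does not state this formula, but it matches all three reported values. The function name is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform split, near 1 when one class dominates.

    Assumed formula, chosen because it matches the reported percentages.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Category counts copied from the report (missing values excluded).
tolls = [5761, 18, 1]
pickup_borough = [5001, 359, 329, 75]
dropoff_borough = [4914, 383, 363, 99]

for name, counts in [("tolls", tolls),
                     ("pickup_borough", pickup_borough),
                     ("dropoff_borough", dropoff_borough)]:
    print(name, round(100 * imbalance_score(counts), 1))
```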

Reproduction

Analysis started: 2024-09-29 00:18:29.677755
Analysis finished: 2024-09-29 00:18:51.419587
Duration: 21.74 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
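
As a quick consistency check, the reported duration follows from the two timestamps. This is a minimal sketch that parses the values exactly as printed above.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-09-29 00:18:29.677755")
finished = datetime.fromisoformat("2024-09-29 00:18:51.419587")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```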

Variables

Unnamed: 0
Real number (ℝ)

HIGH CORRELATION  UNIFORM  UNIQUE 

Distinct: 5780
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3207.3581
Minimum: 0
Maximum: 6432
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 219.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 324.95
Q1: 1590.75
median: 3206.5
Q3: 4813.25
95-th percentile: 6108.05
Maximum: 6432
Range: 6432
Interquartile range (IQR): 3222.5

Descriptive statistics

Standard deviation: 1857.1615
Coefficient of variation (CV): 0.57903155
Kurtosis: -1.2022528
Mean: 3207.3581
Median Absolute Deviation (MAD): 1613
Skewness: 0.004130154
Sum: 18538530
Variance: 3449049
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
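
Several of the statistics for Unnamed: 0 are derived from the others. The sketch below recomputes them from the reported primary values, using the usual definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared). Small differences are expected because the report prints rounded inputs.

```python
import math

# Primary values copied from the report for "Unnamed: 0".
minimum, maximum = 0, 6432
q1, q3 = 1590.75, 4813.25
mean, std = 3207.3581, 1857.1615
total, n = 18538530, 5780

value_range = maximum - minimum   # reported: 6432
iqr = q3 - q1                     # reported: 3222.5
cv = std / mean                   # reported: 0.57903155
variance = std ** 2               # reported: 3449049
mean_from_sum = total / n         # reported mean: 3207.3581

print(value_range, iqr, round(cv, 8), round(variance), round(mean_from_sum, 4))
```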
Common values

Value                Count  Frequency (%)
0                    1      < 0.1%
4290                 1      < 0.1%
4288                 1      < 0.1%
4287                 1      < 0.1%
4285                 1      < 0.1%
4284                 1      < 0.1%
4283                 1      < 0.1%
4282                 1      < 0.1%
4281                 1      < 0.1%
4279                 1      < 0.1%
Other values (5770)  5770   99.8%

Smallest values

Value  Count  Frequency (%)
0      1      < 0.1%
1      1      < 0.1%
2      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Largest values

Value  Count  Frequency (%)
6432   1      < 0.1%
6431   1      < 0.1%
6430   1      < 0.1%
6428   1      < 0.1%
6427   1      < 0.1%
6426   1      < 0.1%
6425   1      < 0.1%
6424   1      < 0.1%
6423   1      < 0.1%
6422   1      < 0.1%

pickup
Date

Distinct: 5767
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Memory size: 219.4 KiB
Minimum: 2019-02-28 23:29:03
Maximum: 2019-03-31 23:15:03
Histogram with fixed size bins (bins=50)

dropoff
Date

Distinct: 5774
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 219.4 KiB
Minimum: 2019-02-28 23:32:35
Maximum: 2019-03-31 23:27:12
Histogram with fixed size bins (bins=50)

passengers
Real number (ℝ)

ZEROS 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5401384
Minimum: 0
Maximum: 6
Zeros: 85
Zeros (%): 1.5%
Negative: 0
Negative (%): 0.0%
Memory size: 219.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.2048877
Coefficient of variation (CV): 0.78232427
Kurtosis: 4.8353555
Mean: 1.5401384
Median Absolute Deviation (MAD): 0
Skewness: 2.3523074
Sum: 8902
Variance: 1.4517543
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Common values

Value  Count  Frequency (%)
1      4207   72.8%
2      783    13.5%
5      254    4.4%
3      215    3.7%
6      135    2.3%
4      101    1.7%
0      85     1.5%

Smallest values

Value  Count  Frequency (%)
0      85     1.5%
1      4207   72.8%
2      783    13.5%
3      215    3.7%
4      101    1.7%
5      254    4.4%
6      135    2.3%

Largest values

Value  Count  Frequency (%)
6      135    2.3%
5      254    4.4%
4      101    1.7%
3      215    3.7%
2      783    13.5%
1      4207   72.8%
0      85     1.5%
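
Because passengers has only seven distinct values, the whole column can be rebuilt from its value counts and the summary statistics recomputed. The sketch below uses only the standard library. That the reported variance is the sample variance (divisor n - 1) is an inference from the match, not something the report states.

```python
import statistics

# Value counts copied from the passengers frequency table.
value_counts = {0: 85, 1: 4207, 2: 783, 3: 215, 4: 101, 5: 254, 6: 135}

# Expand the counts back into one entry per observation.
passengers = [value for value, count in value_counts.items() for _ in range(count)]

mean = statistics.mean(passengers)
median = statistics.median(passengers)
# Median absolute deviation: median of |x - median|.
mad = statistics.median(abs(value - median) for value in passengers)
# statistics.variance uses the n - 1 divisor (sample variance).
variance = statistics.variance(passengers)

print(len(passengers), sum(passengers), mean, median, mad, variance)
```

The MAD of 0 is expected here: more than half of the trips carry exactly one passenger, so more than half of the absolute deviations from the median are zero.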

distance
Real number (ℝ)

HIGH CORRELATION 

Distinct: 697
Distinct (%): 12.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0542405
Minimum: 0
Maximum: 17.1
Zeros: 20
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 219.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.5
Q1: 0.93
median: 1.5
Q3: 2.6
95-th percentile: 5.65
Maximum: 17.1
Range: 17.1
Interquartile range (IQR): 1.67

Descriptive statistics

Standard deviation: 1.6830888
Coefficient of variation (CV): 0.81932414
Kurtosis: 5.6583398
Mean: 2.0542405
Median Absolute Deviation (MAD): 0.7
Skewness: 2.0504716
Sum: 11873.51
Variance: 2.832788
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.8                 131    2.3%
1                   118    2.0%
0.9                 116    2.0%
0.7                 108    1.9%
1.2                 107    1.9%
1.1                 106    1.8%
0.6                 104    1.8%
1.3                 97     1.7%
1.6                 89     1.5%
1.4                 83     1.4%
Other values (687)  4721   81.7%

Smallest values

Value  Count  Frequency (%)
0      20     0.3%
0.02   1      < 0.1%
0.09   1      < 0.1%
0.1    4      0.1%
0.12   2      < 0.1%
0.13   1      < 0.1%
0.15   2      < 0.1%
0.16   2      < 0.1%
0.17   1      < 0.1%
0.2    14     0.2%

Largest values

Value  Count  Frequency (%)
17.1   1      < 0.1%
11.93  1      < 0.1%
11.52  1      < 0.1%
11.48  1      < 0.1%
11.2   1      < 0.1%
11.19  1      < 0.1%
11.14  1      < 0.1%
11.05  1      < 0.1%
10.97  1      < 0.1%
10.94  1      < 0.1%

fare
Real number (ℝ)

HIGH CORRELATION 

Distinct: 116
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.19524
Minimum: 2.5
Maximum: 33.78
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 219.4 KiB

Quantile statistics

Minimum: 2.5
5-th percentile: 4.5
Q1: 6
median: 8.5
Q3: 12.5
95-th percentile: 21.405
Maximum: 33.78
Range: 31.28
Interquartile range (IQR): 6.5

Descriptive statistics

Standard deviation: 5.4080392
Coefficient of variation (CV): 0.53044743
Kurtosis: 1.9667494
Mean: 10.19524
Median Absolute Deviation (MAD): 3
Skewness: 1.3708082
Sum: 58928.49
Variance: 29.246888
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
7.5                 349    6.0%
6                   345    6.0%
6.5                 323    5.6%
5                   313    5.4%
5.5                 312    5.4%
7                   296    5.1%
8                   258    4.5%
8.5                 254    4.4%
9.5                 229    4.0%
4.5                 217    3.8%
Other values (106)  2884   49.9%

Smallest values

Value  Count  Frequency (%)
2.5    9      0.2%
3      21     0.4%
3.5    91     1.6%
4      154    2.7%
4.5    217    3.8%
5      313    5.4%
5.5    312    5.4%
6      345    6.0%
6.5    323    5.6%
7      296    5.1%

Largest values

Value  Count  Frequency (%)
33.78  1      < 0.1%
33.67  1      < 0.1%
33.65  1      < 0.1%
33.54  1      < 0.1%
33.5   3      0.1%
33.27  1      < 0.1%
33.22  1      < 0.1%
33.01  1      < 0.1%
33     4      0.1%
32.5   2      < 0.1%

tip
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 305
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5851955
Minimum: 0
Maximum: 6.82
Zeros: 2070
Zeros (%): 35.8%
Negative: 0
Negative (%): 0.0%
Memory size: 219.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1.65
Q3: 2.56
95-th percentile: 4.35
Maximum: 6.82
Range: 6.82
Interquartile range (IQR): 2.56

Descriptive statistics

Standard deviation: 1.4900555
Coefficient of variation (CV): 0.93998215
Kurtosis: -0.34465861
Mean: 1.5851955
Median Absolute Deviation (MAD): 1.53
Skewness: 0.59670723
Sum: 9162.43
Variance: 2.2202653
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0                   2070   35.8%
1                   310    5.4%
2                   205    3.5%
2.16                89     1.5%
1.86                79     1.4%
2.36                74     1.3%
2.26                74     1.3%
1.96                73     1.3%
3                   71     1.2%
1.5                 71     1.2%
Other values (295)  2664   46.1%

Smallest values

Value  Count  Frequency (%)
0      2070   35.8%
0.01   6      0.1%
0.02   2      < 0.1%
0.08   1      < 0.1%
0.09   1      < 0.1%
0.1    1      < 0.1%
0.2    1      < 0.1%
0.25   1      < 0.1%
0.37   1      < 0.1%
0.39   1      < 0.1%

Largest values

Value  Count  Frequency (%)
6.82   1      < 0.1%
6.8    1      < 0.1%
6.7    1      < 0.1%
6.62   1      < 0.1%
6.58   2      < 0.1%
6.55   2      < 0.1%
6.54   1      < 0.1%
6.51   1      < 0.1%
6.45   1      < 0.1%
6.39   3      0.1%

tolls
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 512.9 KiB
0.0: 5761
5.76: 18
5.54: 1

Length

Max length: 4
Median length: 3
Mean length: 3.0032872
Min length: 3

Characters and Unicode

Total characters: 17359
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value  Count  Frequency (%)
0.0    5761   99.7%
5.76   18     0.3%
5.54   1      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0.0    5761   99.7%
5.76   18     0.3%
5.54   1      < 0.1%

Most occurring characters

Value  Count  Frequency (%)
0      11522  66.4%
.      5780   33.3%
5      20     0.1%
7      18     0.1%
6      18     0.1%
4      1      < 0.1%

Most occurring categories, scripts, and blocks

Each of these three tables contains the same single row.

Value      Count  Frequency (%)
(unknown)  17359  100.0%

Most frequent character per category, script, and block

Every character belongs to the single (unknown) group, so each of these three tables repeats the "Most occurring characters" table.

total
Real number (ℝ)

HIGH CORRELATION 

Distinct: 539
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.967836
Minimum: 4.8
Maximum: 34.55
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 219.4 KiB

Quantile statistics

Minimum: 4.8
5-th percentile: 7.3
Q1: 10.56
median: 13.55
Q3: 18.3
95-th percentile: 27.36
Maximum: 34.55
Range: 29.75
Interquartile range (IQR): 7.74

Descriptive statistics

Standard deviation: 6.0366074
Coefficient of variation (CV): 0.4033053
Kurtosis: 0.66567905
Mean: 14.967836
Median Absolute Deviation (MAD): 3.59
Skewness: 1.0083505
Sum: 86514.09
Variance: 36.440629
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
9.3                 146    2.5%
11.3                129    2.2%
11.8                122    2.1%
10.3                121    2.1%
9.8                 121    2.1%
8.8                 118    2.0%
12.3                101    1.7%
8.3                 100    1.7%
10.8                98     1.7%
12.8                95     1.6%
Other values (529)  4629   80.1%

Smallest values

Value  Count  Frequency (%)
4.8    19     0.3%
4.81   1      < 0.1%
5      2      < 0.1%
5.28   1      < 0.1%
5.3    17     0.3%
5.38   1      < 0.1%
5.55   1      < 0.1%
5.76   1      < 0.1%
5.8    38     0.7%
6      2      < 0.1%

Largest values

Value  Count  Frequency (%)
34.55  2      < 0.1%
34.51  1      < 0.1%
34.3   10     0.2%
34.28  1      < 0.1%
34.27  1      < 0.1%
34.26  1      < 0.1%
34.17  1      < 0.1%
34.15  1      < 0.1%
34.12  1      < 0.1%
34.1   1      < 0.1%

color
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 529.0 KiB
yellow: 4914
green: 866

Length

Max length: 6
Median length: 6
Mean length: 5.850173
Min length: 5

Characters and Unicode

Total characters: 33814
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: yellow
2nd row: yellow
3rd row: yellow
4th row: yellow
5th row: yellow

Common Values

Value   Count  Frequency (%)
yellow  4914   85.0%
green   866    15.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
yellow  4914   85.0%
green   866    15.0%

Most occurring characters

Value  Count  Frequency (%)
l      9828   29.1%
e      6646   19.7%
y      4914   14.5%
o      4914   14.5%
w      4914   14.5%
g      866    2.6%
r      866    2.6%
n      866    2.6%

Most occurring categories, scripts, and blocks

Each of these three tables contains the same single row.

Value      Count  Frequency (%)
(unknown)  33814  100.0%

Most frequent character per category, script, and block

Every character belongs to the single (unknown) group, so each of these three tables repeats the "Most occurring characters" table.

payment
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 37
Missing (%): 0.6%
Memory size: 546.0 KiB
credit card: 4050
cash: 1693

Length

Max length: 11
Median length: 11
Mean length: 8.9364444
Min length: 4

Characters and Unicode

Total characters: 51322
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: credit card
2nd row: cash
3rd row: credit card
4th row: credit card
5th row: credit card

Common Values

Value        Count  Frequency (%)
credit card  4050   70.1%
cash         1693   29.3%
(Missing)    37     0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
credit  4050   41.4%
card    4050   41.4%
cash    1693   17.3%

Most occurring characters

Value    Count  Frequency (%)
c        9793   19.1%
r        8100   15.8%
d        8100   15.8%
a        5743   11.2%
e        4050   7.9%
i        4050   7.9%
t        4050   7.9%
(space)  4050   7.9%
s        1693   3.3%
h        1693   3.3%

Most occurring categories, scripts, and blocks

Each of these three tables contains the same single row.

Value      Count  Frequency (%)
(unknown)  51322  100.0%

Most frequent character per category, script, and block

Every character belongs to the single (unknown) group, so each of these three tables repeats the "Most occurring characters" table.

pickup_zone
Text

Distinct: 178
Distinct (%): 3.1%
Missing: 16
Missing (%): 0.3%
Memory size: 587.2 KiB

Length

Max length: 33
Median length: 28
Mean length: 16.275156
Min length: 4

Characters and Unicode

Total characters: 93810
Distinct characters: 54
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32
Unique (%): 0.6%

Sample

1st row: Lenox Hill West
2nd row: Upper West Side South
3rd row: Alphabet City
4th row: Midtown East
5th row: Times Sq/Theatre District

Words

Value               Count  Frequency (%)
east                1633   11.4%
west                1011   7.0%
south               773    5.4%
north               772    5.4%
side                714    5.0%
midtown             668    4.6%
upper               617    4.3%
village             495    3.4%
sq                  418    2.9%
hill                413    2.9%
Other values (183)  6856   47.7%

Most occurring characters

Value              Count  Frequency (%)
t                  8693   9.3%
(space)            8606   9.2%
e                  7849   8.4%
i                  6110   6.5%
a                  5874   6.3%
r                  5420   5.8%
o                  5227   5.6%
n                  5132   5.5%
s                  4661   5.0%
l                  4385   4.7%
Other values (44)  31853  34.0%

Most occurring categories, scripts, and blocks

Each of these three tables contains the same single row.

Value      Count  Frequency (%)
(unknown)  93810  100.0%

Most frequent character per category, script, and block

Every character belongs to the single (unknown) group, so each of these three tables repeats the "Most occurring characters" table.

dropoff_zone
Text

Distinct: 196
Distinct (%): 3.4%
Missing: 21
Missing (%): 0.4%
Memory size: 587.3 KiB

Length

Max length: 35
Median length: 29
Mean length: 16.331134
Min length: 4

Characters and Unicode

Total characters: 94051
Distinct characters: 54
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 24
Unique (%): 0.4%

Sample

1st row: UN/Turtle Bay South
2nd row: Upper West Side South
3rd row: West Village
4th row: Yorkville West
5th row: Midtown East

Words

Value               Count  Frequency (%)
east                1543   10.7%
west                1015   7.0%
north               874    6.0%
south               752    5.2%
side                751    5.2%
upper               658    4.6%
midtown             586    4.1%
hill                515    3.6%
village             468    3.2%
sq                  308    2.1%
Other values (204)  6988   48.3%

Most occurring characters

Value              Count  Frequency (%)
(space)            8699   9.2%
t                  8510   9.0%
e                  7980   8.5%
i                  5952   6.3%
a                  5951   6.3%
r                  5683   6.0%
o                  5033   5.4%
n                  4745   5.0%
s                  4661   5.0%
l                  4627   4.9%
Other values (44)  32210  34.2%

Most occurring categories, scripts, and blocks

Each of these three tables contains the same single row.

Value      Count  Frequency (%)
(unknown)  94051  100.0%

Most frequent character per category, script, and block

Every character belongs to the single (unknown) group, so each of these three tables repeats the "Most occurring characters" table.

pickup_borough
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.1%
Missing: 16
Missing (%): 0.3%
Memory size: 544.9 KiB
Manhattan: 5001
Queens: 359
Brooklyn: 329
Bronx: 75

Length

Max length: 9
Median length: 9
Mean length: 8.704025
Min length: 5

Characters and Unicode

Total characters: 50170
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Manhattan
2nd row: Manhattan
3rd row: Manhattan
4th row: Manhattan
5th row: Manhattan

Common Values

Value      Count  Frequency (%)
Manhattan  5001   86.5%
Queens     359    6.2%
Brooklyn   329    5.7%
Bronx      75     1.3%
(Missing)  16     0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count  Frequency (%)
manhattan  5001   86.8%
queens     359    6.2%
brooklyn   329    5.7%
bronx      75     1.3%

Most occurring characters

Value             Count  Frequency (%)
a                 15003  29.9%
n                 10765  21.5%
t                 10002  19.9%
M                 5001   10.0%
h                 5001   10.0%
o                 733    1.5%
e                 718    1.4%
B                 404    0.8%
r                 404    0.8%
Q                 359    0.7%
Other values (6)  1780   3.5%

Most occurring categories, scripts, and blocks

Each of these three tables contains the same single row.

Value      Count  Frequency (%)
(unknown)  50170  100.0%

Most frequent character per category, script, and block

Every character belongs to the single (unknown) group, so each of these three tables repeats the "Most occurring characters" table.

dropoff_borough
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.1%
Missing: 21
Missing (%): 0.4%
Memory size: 544.7 KiB
Manhattan: 4914
Brooklyn: 383
Queens: 363
Bronx: 99

Length

Max length: 9
Median length: 9
Mean length: 8.6756381
Min length: 5

Characters and Unicode

Total characters: 49963
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Manhattan
2nd row: Manhattan
3rd row: Manhattan
4th row: Manhattan
5th row: Manhattan

Common Values

Value      Count  Frequency (%)
Manhattan  4914   85.0%
Brooklyn   383    6.6%
Queens     363    6.3%
Bronx      99     1.7%
(Missing)  21     0.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count  Frequency (%)
manhattan  4914   85.3%
brooklyn   383    6.7%
queens     363    6.3%
bronx      99     1.7%

Most occurring characters

Value             Count  Frequency (%)
a                 14742  29.5%
n                 10673  21.4%
t                 9828   19.7%
M                 4914   9.8%
h                 4914   9.8%
o                 865    1.7%
e                 726    1.5%
B                 482    1.0%
r                 482    1.0%
k                 383    0.8%
Other values (6)  1954   3.9%

Most occurring categories, scripts, and blocks

Each of these three tables contains the same single row.

Value      Count  Frequency (%)
(unknown)  49963  100.0%

Most frequent character per category, script, and block

Every character belongs to the single (unknown) group, so each of these three tables repeats the "Most occurring characters" table.

log_tip
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 305
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.76578564
Minimum: 0
Maximum: 2.0566846
Zeros: 2070
Zeros (%): 35.8%
Negative: 0
Negative (%): 0.0%
Memory size: 219.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.97455964
Q3: 1.2697605
95-th percentile: 1.6770966
Maximum: 2.0566846
Range: 2.0566846
Interquartile range (IQR): 1.2697605

Descriptive statistics

Standard deviation: 0.62691948
Coefficient of variation (CV): 0.81866184
Kurtosis: -1.4747582
Mean: 0.76578564
Median Absolute Deviation (MAD): 0.47235934
Skewness: -0.11079472
Sum: 4426.241
Variance: 0.39302804
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0                   2070   35.8%
0.6931471806        310    5.4%
1.098612289         205    3.5%
1.150572028         89     1.5%
1.050821625         79     1.4%
1.211940974         74     1.3%
1.181727195         74     1.3%
1.085189268         73     1.3%
1.386294361         71     1.2%
0.9162907319        71     1.2%
Other values (295)  2664   46.1%

Smallest values

Value           Count  Frequency (%)
0               2070   35.8%
0.009950330853  6      0.1%
0.0198026273    2      < 0.1%
0.07696104114   1      < 0.1%
0.08617769624   1      < 0.1%
0.0953101798    1      < 0.1%
0.1823215568    1      < 0.1%
0.2231435513    1      < 0.1%
0.3148107398    1      < 0.1%
0.3293037471    1      < 0.1%

Largest values

Value        Count  Frequency (%)
2.056684555  1      < 0.1%
2.054123734  1      < 0.1%
2.041220329  1      < 0.1%
2.03077637   1      < 0.1%
2.0255132    2      < 0.1%
2.021547563  2      < 0.1%
2.020222182  1      < 0.1%
2.016235466  1      < 0.1%
2.008214032  1      < 0.1%
2.000127735  3      0.1%
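
The report does not say how log_tip and log_total were derived. Comparing the frequency tables suggests log(1 + x): a tip of 1 maps to 0.6931471806, the maximum tip of 6.82 maps to 2.056684555, and the minimum total of 4.8 maps to 1.757857918. The sketch below checks this assumed transform against value pairs copied from the tables. The pairing of raw and logged values is inferred from matching counts and ordering.

```python
import math

# (raw value, reported logged value) pairs read off the frequency tables.
tip_pairs = [
    (0.0, 0.0),
    (0.01, 0.009950330853),
    (1.0, 0.6931471806),
    (2.0, 1.098612289),
    (2.16, 1.150572028),
    (6.82, 2.056684555),
]
total_pairs = [
    (4.8, 1.757857918),
    (34.55, 3.570940156),
]

def matches_log1p(pairs, rel_tol=1e-8):
    """True if every logged value equals log(1 + raw) within the tolerance."""
    return all(
        math.isclose(math.log1p(raw), logged, rel_tol=rel_tol, abs_tol=1e-12)
        for raw, logged in pairs
    )

all_match = matches_log1p(tip_pairs) and matches_log1p(total_pairs)
print(all_match)
```

Using log(1 + x) rather than log(x) keeps the 2070 zero tips finite, which is consistent with log_tip reporting the same 2070 zeros as tip.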

log_total
Real number (ℝ)

HIGH CORRELATION 

Distinct: 539
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.7042991
Minimum: 1.7578579
Maximum: 3.5709402
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 219.4 KiB

Quantile statistics

Minimum: 1.7578579
5-th percentile: 2.1162555
Q1: 2.4475509
median: 2.677591
Q3: 2.9601051
95-th percentile: 3.3449797
Maximum: 3.5709402
Range: 1.8130822
Interquartile range (IQR): 0.51255423

Descriptive statistics

Standard deviation: 0.36131995
Coefficient of variation (CV): 0.13360947
Kurtosis: -0.34760902
Mean: 2.7042991
Median Absolute Deviation (MAD): 0.25278827
Skewness: 0.16656486
Sum: 15630.849
Variance: 0.13055211
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
2.332143895         146    2.5%
2.509599262         129    2.2%
2.549445171         122    2.1%
2.424802726         121    2.1%
2.379546134         121    2.1%
2.282382386         118    2.0%
2.587764035         101    1.7%
2.2300144           100    1.7%
2.468099531         98     1.7%
2.624668592         95     1.6%
Other values (529)  4629   80.1%

Smallest values

Value        Count  Frequency (%)
1.757857918  19     0.3%
1.759580571  1      < 0.1%
1.791759469  2      < 0.1%
1.83736998   1      < 0.1%
1.840549633  17     0.3%
1.853168097  1      < 0.1%
1.87946505   1      < 0.1%
1.91102289   1      < 0.1%
1.916922612  38     0.7%
1.945910149  2      < 0.1%

Largest values

Value        Count  Frequency (%)
3.570940156  2      < 0.1%
3.569814347  1      < 0.1%
3.563882964  10     0.2%
3.563316231  1      < 0.1%
3.563032744  1      < 0.1%
3.562749177  1      < 0.1%
3.560193446  1      < 0.1%
3.559624618  1      < 0.1%
3.558770769  1      < 0.1%
3.55820113   1      < 0.1%

Interactions

[Figure: pairwise interaction plots, 64 panels]

Correlations

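| | Unnamed: 0 | color | distance | dropoff_borough | fare | log_tip | log_total | passengers | payment | pickup_borough | tip | tolls | total |
| Unnamed: 0 | 1.000 | 0.897 | 0.049 | 0.307 | 0.029 | -0.146 | -0.094 | -0.072 | 0.109 | 0.352 | -0.146 | 0.000 | -0.094 |
| color | 0.897 | 1.000 | 0.116 | 0.598 | 0.117 | 0.278 | 0.297 | 0.144 | 0.126 | 0.686 | 0.278 | 0.033 | 0.286 |
| distance | 0.049 | 0.116 | 1.000 | 0.193 | 0.921 | 0.233 | 0.850 | 0.006 | 0.044 | 0.157 | 0.233 | 0.104 | 0.850 |
| dropoff_borough | 0.307 | 0.598 | 0.193 | 1.000 | 0.162 | 0.167 | 0.153 | 0.047 | 0.149 | 0.757 | 0.168 | 0.047 | 0.151 |
| fare | 0.029 | 0.117 | 0.921 | 0.162 | 1.000 | 0.254 | 0.928 | -0.001 | 0.055 | 0.145 | 0.254 | 0.092 | 0.928 |
| log_tip | -0.146 | 0.278 | 0.233 | 0.167 | 0.254 | 1.000 | 0.497 | 0.030 | 0.869 | 0.168 | 1.000 | 0.043 | 0.497 |
| log_total | -0.094 | 0.297 | 0.850 | 0.153 | 0.928 | 0.497 | 1.000 | 0.025 | 0.323 | 0.130 | 0.497 | 0.134 | 1.000 |
| passengers | -0.072 | 0.144 | 0.006 | 0.047 | -0.001 | 0.030 | 0.025 | 1.000 | 0.034 | 0.052 | 0.030 | 0.000 | 0.025 |
| payment | 0.109 | 0.126 | 0.044 | 0.149 | 0.055 | 0.869 | 0.323 | 0.034 | 1.000 | 0.159 | 0.859 | 0.000 | 0.282 |
| pickup_borough | 0.352 | 0.686 | 0.157 | 0.757 | 0.145 | 0.168 | 0.130 | 0.052 | 0.159 | 1.000 | 0.167 | 0.050 | 0.131 |
| tip | -0.146 | 0.278 | 0.233 | 0.168 | 0.254 | 1.000 | 0.497 | 0.030 | 0.859 | 0.167 | 1.000 | 0.074 | 0.497 |
| tolls | 0.000 | 0.033 | 0.104 | 0.047 | 0.092 | 0.043 | 0.134 | 0.000 | 0.000 | 0.050 | 0.074 | 1.000 | 0.142 |
| total | -0.094 | 0.286 | 0.850 | 0.151 | 0.928 | 0.497 | 1.000 | 0.025 | 0.282 | 0.131 | 0.497 | 0.142 | 1.000 |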
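
The matrix reports a coefficient of 1.000 between tip and log_tip (and between total and log_total). One way such a value can arise is with a rank-based coefficient, because a monotonic transform such as a logarithm leaves the ordering of values unchanged. The sketch below illustrates this with a hand-written Spearman correlation on the tip values from this report's sample rows. Whether the report actually used a rank-based measure for these pairs is an assumption, and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical, not part of any profiling library.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# tip values copied from the first sample rows of this report
tips = [2.15, 0.00, 2.36, 1.10, 2.16, 2.00, 0.00, 1.00, 1.00, 0.00]
# Assumption: log_tip is log(1 + tip), which matches the sample rows.
log_tips = [math.log1p(t) for t in tips]

rho_rank = spearman(tips, log_tips)   # equal to 1 up to floating-point rounding
rho_linear = pearson(tips, log_tips)  # below 1, since log1p is not linear
```

Because the ranks of `tips` and `log_tips` are identical, the rank-based coefficient is 1 regardless of ties, while the linear coefficient on the same data is strictly smaller.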

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
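
As a rough illustration of the three views described above, the sketch below counts missing cells per column and computes a nullity correlation as the Pearson correlation between presence indicators (1 = present, 0 = missing). The six rows are a hypothetical mini-dataset, not the profiled data, and the helper names are illustrative assumptions rather than the report's implementation.

```python
import math

# Hypothetical rows; None marks a missing cell.
rows = [
    {"payment": "credit card", "pickup_zone": "Lenox Hill West", "dropoff_zone": "UN/Turtle Bay South"},
    {"payment": None,          "pickup_zone": "Murray Hill",     "dropoff_zone": "Flatiron"},
    {"payment": "cash",        "pickup_zone": None,              "dropoff_zone": None},
    {"payment": "cash",        "pickup_zone": "Astoria",         "dropoff_zone": None},
    {"payment": "credit card", "pickup_zone": None,              "dropoff_zone": None},
    {"payment": "cash",        "pickup_zone": "Central Park",    "dropoff_zone": "Midtown East"},
]
columns = ["payment", "pickup_zone", "dropoff_zone"]


def presence(rows, column):
    """Return 1 where the column has a value and 0 where it is missing."""
    return [0 if row[column] is None else 1 for row in rows]


def pearson(xs, ys):
    """Pearson correlation; NaN when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a column with no (or only) missing values
    return cov / math.sqrt(var_x * var_y)


# Count view: number of missing cells per column.
missing_counts = {c: sum(1 for row in rows if row[c] is None) for c in columns}

# Heatmap view: correlation between presence indicators for each column pair.
indicators = {c: presence(rows, c) for c in columns}
nullity_corr = {
    (a, b): pearson(indicators[a], indicators[b])
    for a in columns
    for b in columns
    if a < b
}
```

A value near 1 means two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means their missingness is unrelated.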

Sample

First rows

| | Unnamed: 0 | pickup | dropoff | passengers | distance | fare | tip | tolls | total | color | payment | pickup_zone | dropoff_zone | pickup_borough | dropoff_borough | log_tip | log_total |
| 0 | 0 | 2019-03-23 20:21:09 | 2019-03-23 20:27:24 | 1 | 1.60 | 7.0 | 2.15 | 0.0 | 12.95 | yellow | credit card | Lenox Hill West | UN/Turtle Bay South | Manhattan | Manhattan | 1.147402 | 2.635480 |
| 1 | 1 | 2019-03-04 16:11:55 | 2019-03-04 16:19:00 | 1 | 0.79 | 5.0 | 0.00 | 0.0 | 9.30 | yellow | cash | Upper West Side South | Upper West Side South | Manhattan | Manhattan | 0.000000 | 2.332144 |
| 2 | 2 | 2019-03-27 17:53:01 | 2019-03-27 18:00:25 | 1 | 1.37 | 7.5 | 2.36 | 0.0 | 14.16 | yellow | credit card | Alphabet City | West Village | Manhattan | Manhattan | 1.211941 | 2.718660 |
| 4 | 4 | 2019-03-30 13:27:42 | 2019-03-30 13:37:14 | 3 | 2.16 | 9.0 | 1.10 | 0.0 | 13.40 | yellow | credit card | Midtown East | Yorkville West | Manhattan | Manhattan | 0.741937 | 2.667228 |
| 5 | 5 | 2019-03-11 10:37:23 | 2019-03-11 10:47:31 | 1 | 0.49 | 7.5 | 2.16 | 0.0 | 12.96 | yellow | credit card | Times Sq/Theatre District | Midtown East | Manhattan | Manhattan | 1.150572 | 2.636196 |
| 6 | 6 | 2019-03-26 21:07:31 | 2019-03-26 21:17:29 | 1 | 3.65 | 13.0 | 2.00 | 0.0 | 18.80 | yellow | credit card | Battery Park City | Two Bridges/Seward Park | Manhattan | Manhattan | 1.098612 | 2.985682 |
| 7 | 7 | 2019-03-22 12:47:13 | 2019-03-22 12:58:17 | 0 | 1.40 | 8.5 | 0.00 | 0.0 | 11.80 | yellow | NaN | Murray Hill | Flatiron | Manhattan | Manhattan | 0.000000 | 2.549445 |
| 8 | 8 | 2019-03-23 11:48:50 | 2019-03-23 12:06:14 | 1 | 3.63 | 15.0 | 1.00 | 0.0 | 19.30 | yellow | credit card | East Harlem South | Midtown Center | Manhattan | Manhattan | 0.693147 | 3.010621 |
| 9 | 9 | 2019-03-08 16:18:37 | 2019-03-08 16:26:57 | 1 | 1.52 | 8.0 | 1.00 | 0.0 | 13.30 | yellow | credit card | Lincoln Square East | Central Park | Manhattan | Manhattan | 0.693147 | 2.660260 |
| 10 | 10 | 2019-03-16 10:02:25 | 2019-03-16 10:22:29 | 1 | 3.90 | 17.0 | 0.00 | 0.0 | 17.80 | yellow | cash | LaGuardia Airport | Astoria | Queens | Queens | 0.000000 | 2.933857 |

Last rows

| | Unnamed: 0 | pickup | dropoff | passengers | distance | fare | tip | tolls | total | color | payment | pickup_zone | dropoff_zone | pickup_borough | dropoff_borough | log_tip | log_total |
| 6422 | 6422 | 2019-03-22 20:17:35 | 2019-03-22 20:36:07 | 1 | 4.02 | 16.00 | 0.00 | 0.0 | 17.30 | green | cash | Washington Heights South | Spuyten Duyvil/Kingsbridge | Manhattan | Bronx | 0.000000 | 2.906901 |
| 6423 | 6423 | 2019-03-12 08:10:47 | 2019-03-12 08:35:35 | 1 | 4.30 | 18.50 | 0.00 | 0.0 | 19.30 | green | credit card | Saint Albans | Hillcrest/Pomonok | Queens | Queens | 0.000000 | 3.010621 |
| 6424 | 6424 | 2019-03-30 20:52:15 | 2019-03-30 20:59:55 | 1 | 1.70 | 8.00 | 0.00 | 0.0 | 9.30 | green | cash | Central Harlem | Central Harlem North | Manhattan | Manhattan | 0.000000 | 2.332144 |
| 6425 | 6425 | 2019-03-07 15:34:30 | 2019-03-07 16:31:06 | 1 | 9.12 | 26.32 | 0.00 | 0.0 | 26.82 | green | credit card | Park Slope | East New York | Brooklyn | Brooklyn | 0.000000 | 3.325755 |
| 6426 | 6426 | 2019-03-28 08:04:47 | 2019-03-28 08:07:46 | 1 | 0.71 | 4.50 | 0.50 | 0.0 | 5.80 | green | credit card | Central Park | Upper West Side North | Manhattan | Manhattan | 0.405465 | 1.916923 |
| 6427 | 6427 | 2019-03-23 18:26:09 | 2019-03-23 18:49:12 | 1 | 7.07 | 20.00 | 0.00 | 0.0 | 20.00 | green | cash | Parkchester | East Harlem South | Bronx | Manhattan | 0.000000 | 3.044522 |
| 6428 | 6428 | 2019-03-31 09:51:53 | 2019-03-31 09:55:27 | 1 | 0.75 | 4.50 | 1.06 | 0.0 | 6.36 | green | credit card | East Harlem North | Central Harlem North | Manhattan | Manhattan | 0.722706 | 1.996060 |
| 6430 | 6430 | 2019-03-23 22:55:18 | 2019-03-23 23:14:25 | 1 | 4.14 | 16.00 | 0.00 | 0.0 | 17.30 | green | cash | Crown Heights North | Bushwick North | Brooklyn | Brooklyn | 0.000000 | 2.906901 |
| 6431 | 6431 | 2019-03-04 10:09:25 | 2019-03-04 10:14:29 | 1 | 1.12 | 6.00 | 0.00 | 0.0 | 6.80 | green | credit card | East New York | East Flatbush/Remsen Village | Brooklyn | Brooklyn | 0.000000 | 2.054124 |
| 6432 | 6432 | 2019-03-13 19:31:22 | 2019-03-13 19:48:02 | 1 | 3.85 | 15.00 | 3.36 | 0.0 | 20.16 | green | credit card | Boerum Hill | Windsor Terrace | Brooklyn | Brooklyn | 1.472472 | 3.052113 |
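
The log_tip and log_total values in the rows above appear consistent with the natural logarithm of one plus the raw value, log(1 + x), which also explains why a tip of 0.00 maps to a log_tip of 0.000000. The report does not state how these columns were derived, so this is an inference from the sample. The short check below tests it against a few of the rows, with the tolerance chosen to absorb the six-decimal rounding of the displayed values.

```python
import math

# (tip, total, log_tip, log_total) copied from the sample rows above
sample = [
    (2.15, 12.95, 1.147402, 2.635480),
    (0.00, 9.30, 0.000000, 2.332144),
    (2.36, 14.16, 1.211941, 2.718660),
    (1.10, 13.40, 0.741937, 2.667228),
    (3.36, 20.16, 1.472472, 3.052113),
]


def matches_log1p(value, reported, tol=1e-5):
    """True when `reported` equals log(1 + value) within the tolerance."""
    return abs(math.log1p(value) - reported) <= tol


all_match = all(
    matches_log1p(tip, log_tip) and matches_log1p(total, log_total)
    for tip, total, log_tip, log_total in sample
)
```

If `all_match` is true for the rows checked, the log(1 + x) reading holds for those rows; it does not prove that every row in the dataset was derived the same way.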